Introduction

A heatmap is a graphical representation of data that uses color-coding to represent different values. Heatmaps are used in many kinds of analytics; this R package, however, focuses specifically on providing an efficient way to create interactive heatmaps for categorical data, or for continuous data that can be grouped into categories.

This package was originally developed for Verkehrsbetriebe Zürich (VBZ), the public transport operator of the Swiss city of Zurich, to illustrate the utilization of different routes and vehicles at different times of the day. To this end, it groups utilization data (e.g. persons per m²) into categories (e.g. low, medium and high utilization) and displays them for individual stops over time in a heatmap.
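As a minimal sketch of this grouping step, base R's cut() can bin a continuous utilization measure into ordered categories. The occupancy values, thresholds and labels below are hypothetical; the actual VBZ category limits are not stated here.

```r
# Hypothetical occupancy values (e.g. persons per m^2)
occupancy <- c(0.5, 1.8, 3.2, 4.7)

# Hypothetical thresholds: [-Inf, 1) low, [1, 3) medium, [3, Inf) high
category <- cut(
  occupancy,
  breaks = c(-Inf, 1, 3, Inf),
  labels = c("low", "medium", "high"),
  right = FALSE
)

category              # factor with levels low, medium, high
as.integer(category)  # numeric codes, comparable to a column like Ausl_Kat
```

With right = FALSE each interval includes its lower bound, so a value exactly at a threshold falls into the higher category.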

The package can easily be integrated into a Shiny dashboard, where plotly events enable interactions with other plots (e.g. a boxplot, histogram or forecast). A mini demo app is provided in a separate GitHub repository named catmaply_shiny.

This work is based on the plotly.js engine.

Please submit feature requests

This package is still under active development. If there are features you would like to see added, please submit your suggestions (and bug reports) at: https://github.com/yvesmauron/catmaply/issues

News

You can see the most recent changes of the package in NEWS.md.

Installation

To install the latest (“cutting-edge”) GitHub version run:

# Make sure that you have the correct version of Rtools installed,
# as you might need to build some packages from source.
# If you don't have Rtools installed, you can install it with:
# install.packages('installr'); installr::install.Rtools() # not tested on windows
# or download it from here:
# https://cran.r-project.org/bin/windows/Rtools/
# In any case, make sure that you select the correct version,
# otherwise the installation will fail.

# Then you'll need devtools
if (!requireNamespace('devtools', quietly = TRUE))
  install.packages('devtools')

# Finally, install the package
devtools::install_github('yvesmauron/catmaply')

To install the latest release from CRAN, run:

install.packages("catmaply")

Thereafter, you can start using the package as usual:

library(catmaply)

Usage

Load the demo data shipped with the package.

data("vbz")
df <- vbz[[3]]$data

knitr::kable(head(df, 10))
| halt_seq | fahrt_seq | Haltestellenlangname | Plan_fahrt_Id | LiKu_Name | Linienname | FZG | Besetzung | Ausl_Kat | FZ_AB |
|---:|---:|:---|---:|---:|---:|:---|---:|---:|:---|
| 1 | 7 | Zuerich, Milchbuck | 42146 | 3 | 83 | GB | 14.81250 | 1 | 06:58:30 |
| 1 | 50 | Zuerich, Milchbuck | 59126 | 17 | 83 | GB | 12.50000 | 1 | 19:13:00 |
| 1 | 36 | Zuerich, Milchbuck | 25787 | 11 | 83 | GB | 18.50000 | 2 | 17:27:30 |
| 1 | 37 | Zuerich, Milchbuck | 31452 | 12 | 83 | GB | 16.96667 | 2 | 17:35:00 |
| 1 | 3 | Zuerich, Milchbuck | 64324 | 7 | 83 | GB | 6.96226 | 1 | 06:30:00 |
| 1 | 8 | Zuerich, Milchbuck | 47597 | 4 | 83 | GB | 17.36842 | 2 | 07:06:00 |
| 1 | 49 | Zuerich, Milchbuck | 53513 | 16 | 83 | GB | 13.34211 | 1 | 19:05:30 |
| 1 | 38 | Zuerich, Milchbuck | 37033 | 13 | 83 | GB | 14.95238 | 1 | 17:42:30 |
| 1 | 41 | Zuerich, Milchbuck | 53511 | 16 | 83 | GB | 11.04255 | 1 | 18:05:00 |
| 1 | 6 | Zuerich, Milchbuck | 37041 | 2 | 83 | GB | 14.52500 | 1 | 06:51:00 |

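The main columns of the vbz data.frame used in the examples below are halt_seq (the sequence number of the stop, used to order the y axis), fahrt_seq (the sequence number of the trip), Haltestellenlangname (the full stop name), Besetzung (the occupancy, a continuous value), Ausl_Kat (the utilization category) and FZ_AB (the departure time at the stop).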

So let’s visualize it.

catmaply(
  df,
  x = "fahrt_seq",
  y = "Haltestellenlangname",
  y_order = "halt_seq",
  z = "Ausl_Kat"
)

What about showing differences within a category, i.e. one colorbar per category? Let’s also switch to another color palette (magma).

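To show one colorbar per category, the heatmap cells must contain a continuous value, which is then grouped by a categorical column. In our example, z is the continuous occupancy (Besetzung), categorical_col is the utilization category (Ausl_Kat), and categorical_colorbar = TRUE enables one colorbar per category.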

To change the color palette, you can pass either a vector of colors or a function that returns one.

Note: the color palette function must take n as its first argument, where n is the number of colors to produce.
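A custom palette function only needs to accept n and return n colors. The sketch below builds one with grDevices::colorRampPalette(); the name my_palette and its three anchor colors are our own illustrative choices, not part of catmaply.

```r
# A custom palette function: n (the number of colors) is its first argument
my_palette <- function(n) {
  # Linearly interpolate between three anchor colors (arbitrary choice)
  grDevices::colorRampPalette(c("#FCFDBF", "#B63679", "#000004"))(n)
}

my_palette(5)
```

It could then be passed as color_palette = my_palette, or as a fixed vector such as color_palette = my_palette(3).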

catmaply(
  df,
  x = "fahrt_seq",
  x_order = "fahrt_seq",
  y = "Haltestellenlangname",
  y_order = "halt_seq",
  z = "Besetzung",
  categorical_colorbar = TRUE,
  categorical_col = "Ausl_Kat",
  color_palette = viridis::magma
)
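To get a feel for what categorical_colorbar works with, the sketch below takes the Besetzung and Ausl_Kat values of the ten demo rows printed above and computes the range of the continuous value within each category. That each category’s colorbar spans roughly this range is our assumption about how catmaply scales the colors, not something documented here.

```r
# Besetzung (continuous) and Ausl_Kat (category) of the ten demo rows above
Besetzung <- c(14.81250, 12.50000, 18.50000, 16.96667, 6.96226,
               17.36842, 13.34211, 14.95238, 11.04255, 14.52500)
Ausl_Kat  <- c(1, 1, 2, 2, 1, 2, 1, 1, 1, 1)

# Split the continuous values by category and take each group's min and max
z_range <- lapply(split(Besetzung, Ausl_Kat), range)
z_range
```

In these rows the two categories do not overlap: every category 2 value lies above every category 1 value.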

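Now, let’s play with the axis formatting: we tilt the x-axis tick labels by 15 degrees (x_tickangle) and change the font color and size (font_color, font_size).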

catmaply(
  df,
  x = "fahrt_seq",
  x_order = "fahrt_seq",
  x_tickangle = 15,
  y = "Haltestellenlangname",
  y_order = "halt_seq",
  z = "Besetzung",
  categorical_colorbar = TRUE,
  categorical_col = "Ausl_Kat",
  color_palette = viridis::magma,
  font_color = "#6D65AB",
  font_size = 10
)

What about a custom hover label? Let’s define a custom hover template with the hover_template parameter. Note that, from here on, column names are passed unquoted.

catmaply(
  df,
  x = fahrt_seq,
  x_order = fahrt_seq,
  x_tickangle = 15,
  y = Haltestellenlangname,
  y_order = halt_seq,
  z = Besetzung,
  categorical_colorbar = TRUE,
  categorical_col = Ausl_Kat,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<b>Fahrt Nr.</b>:', fahrt_seq,
    '<br><b>Haltestelle</b>:', Haltestellenlangname,
    '<br><b>Auslastung</b>:', Ausl_Kat,
    '<br><b>Besetzung</b>:', round(Besetzung, 2),
    '<extra></extra>'
  )
)
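Because paste() is vectorised, the hover_template expression yields one label per cell. The sketch below mimics this with base R’s with() on two of the demo rows printed earlier; that catmaply evaluates the expression against the data frame’s columns in a similar way is our assumption, suggested by the unquoted column names. The trailing '<extra></extra>' is plotly hovertemplate syntax that hides the secondary hover box showing the trace name.

```r
# Two of the demo rows printed earlier
rows <- data.frame(
  fahrt_seq = c(50, 36),
  Haltestellenlangname = c("Zuerich, Milchbuck", "Zuerich, Milchbuck"),
  Besetzung = c(12.5, 18.5),
  Ausl_Kat = c(1, 2)
)

# Evaluate the template with the columns in scope: one label per row
labels <- with(rows, paste(
  '<b>Fahrt Nr.</b>:', fahrt_seq,
  '<br><b>Haltestelle</b>:', Haltestellenlangname,
  '<br><b>Auslastung</b>:', Ausl_Kat,
  '<br><b>Besetzung</b>:', round(Besetzung, 2),
  '<extra></extra>'
))

labels[1]
```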

Define custom names for the legend by setting the legend_col parameter.

library(dplyr)

df <- df %>%
  mutate(
    legend_col = paste("Kategorie", Ausl_Kat)
  )

catmaply(
  df,
  x = fahrt_seq,
  x_order = fahrt_seq,
  x_tickangle = 15,
  y = Haltestellenlangname,
  y_order = halt_seq,
  z = Besetzung,
  categorical_colorbar = TRUE,
  categorical_col = Ausl_Kat,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<b>Fahrt Nr.</b>:', fahrt_seq,
    '<br><b>Haltestelle</b>:', Haltestellenlangname,
    '<br><b>Auslastung</b>:', Ausl_Kat,
    '<br><b>Besetzung</b>:', round(Besetzung, 2),
    '<extra></extra>'
  ),
  legend_col = legend_col
)
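The legend names are just a column of strings derived from the category. Besides the generic paste() labels used above, a lookup table can map each category code to a descriptive name. This sketch assumes, as seems natural for a legend, that every category should map to exactly one name, and checks this; the names themselves are illustrative.

```r
# Hypothetical category codes, as found in a column like Ausl_Kat
Ausl_Kat <- c(1, 2, 1, 3)

# Option 1: generic names, as in the paste() example above
legend_generic <- paste("Kategorie", Ausl_Kat)

# Option 2: descriptive names from a lookup table
# (illustrative labels, not VBZ's official category names)
category_names <- c("1" = "low", "2" = "medium", "3" = "high")
legend_named <- unname(category_names[as.character(Ausl_Kat)])

# Sanity check: each category maps to exactly one legend name
stopifnot(
  nrow(unique(data.frame(Ausl_Kat, legend_named))) == length(unique(Ausl_Kat))
)

legend_named
```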

You can also disable the legend’s interactivity (hiding traces by clicking on legend entries) by setting legend_interactive = FALSE; this may improve performance with large datasets or many traces.

catmaply(
  df,
  x=fahrt_seq,
  x_order = fahrt_seq,
  x_tickangle = 15,
  y = Haltestellenlangname,
  y_order = halt_seq,
  z = Ausl_Kat,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<b>Fahrt Nr.</b>:', fahrt_seq,
    '<br><b>Haltestelle</b>:', Haltestellenlangname,
    '<br><b>Auslastung</b>:', Ausl_Kat,
    '<br><b>Besetzung</b>:', round(Besetzung, 2),
    '<extra></extra>'
  ),
  legend_interactive = FALSE
)

What about hiding the legend altogether?

catmaply(
  df,
  x = fahrt_seq,
  x_order = fahrt_seq,
  x_tickangle = 15,
  y = Haltestellenlangname,
  y_order = halt_seq,
  z = Ausl_Kat,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<b>Fahrt Nr.</b>:', fahrt_seq,
    '<br><b>Haltestelle</b>:', Haltestellenlangname,
    '<br><b>Auslastung</b>:', Ausl_Kat,
    '<br><b>Besetzung</b>:', round(Besetzung, 2),
    '<extra></extra>'
  ),
  legend = FALSE
)

Hmm, didn’t we say that we want to show the development over time? Wouldn’t it make sense, then, to put time on the x axis?

Let’s see how a dynamic x axis is created when you put a column of type POSIXct or POSIXt on the x axis. To do so, we first calculate the departure time of each trip.

library(dplyr)

df <- df %>%
  # attach an arbitrary date so that FZ_AB becomes a full date-time (POSIXct)
  mutate(FZ_AB = lubridate::ymd_hms(paste("2020-06-03", FZ_AB))) %>%
  # the departure of a trip is its earliest time
  group_by(fahrt_seq) %>%
  mutate(departure = min(FZ_AB)) %>%
  ungroup()
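For readers who prefer base R, the sketch below performs the same two steps as the dplyr pipeline above on a few hypothetical rows: it turns the time strings into POSIXct date-times and assigns each row the earliest time of its trip. Only the first time of each trip (06:58:30 for trip 7, 07:06:00 for trip 8) comes from the table above; the other times are made up for illustration.

```r
# Hypothetical stop times for two trips
trips <- data.frame(
  fahrt_seq = c(7, 7, 7, 8, 8),
  FZ_AB = c("06:58:30", "07:00:00", "07:01:30", "07:06:00", "07:07:30")
)

# Attach an arbitrary date so the times become POSIXct date-times
# (UTC, matching lubridate::ymd_hms's default time zone)
trips$FZ_AB <- as.POSIXct(paste("2020-06-03", trips$FZ_AB), tz = "UTC")

# Departure = earliest time of each trip; ave() returns the group minimum
# for every row as seconds, which we convert back to POSIXct
trips$departure <- as.POSIXct(
  ave(as.numeric(trips$FZ_AB), trips$fahrt_seq, FUN = min),
  origin = "1970-01-01", tz = "UTC"
)

trips
```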

catmaply(
  df,
  x = departure,
  y = Haltestellenlangname,
  y_order = halt_seq,
  z = Besetzung,
  categorical_colorbar = TRUE,
  categorical_col = Ausl_Kat,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<b>Fahrt Nr.</b>:', fahrt_seq,
    '<br><b>Haltestelle</b>:', Haltestellenlangname,
    '<br><b>Auslastung</b>:', Ausl_Kat,
    '<br><b>Besetzung</b>:', round(Besetzung, 2),
    '<extra></extra>'
  )
)

Currently, the formatting of the time axis is optimized for analyzing daily data, e.g. when you sample utilization statistics throughout the year and then summarize them to obtain the utilization of a typical day. Thus, even the coarsest zoom level is formatted in hours rather than years. However, you can change the formatting of each zoom level by setting the tickformatstops parameter. For example, to remove the h, m, s and ms unit labels shown above, you could proceed as follows (more information can be found in plotly’s tick formatting example):


catmaply(
  df,
  x = departure,
  y = Haltestellenlangname,
  y_order = halt_seq,
  z = Besetzung,
  categorical_colorbar = TRUE,
  categorical_col = Ausl_Kat,
  color_palette = viridis::inferno,
  hover_template = paste(
    '<b>Fahrt Nr.</b>:', fahrt_seq,
    '<br><b>Haltestelle</b>:', Haltestellenlangname,
    '<br><b>Auslastung</b>:', Ausl_Kat,
    '<br><b>Besetzung</b>:', round(Besetzung, 2),
    '<extra></extra>'
  ),
  tickformatstops=list(
    list(dtickrange = list(NULL, 1000), value = "%H:%M:%S.%L"),
    list(dtickrange = list(1000, 60000), value = "%H:%M:%S"),
    list(dtickrange = list(60000, 3600000), value = "%H:%M"),
    list(dtickrange = list(3600000, 86400000), value = "%H:%M"),
    list(dtickrange = list(86400000, 604800000), value = "%H:%M"),
    list(dtickrange = list(604800000, "M1"), value = "%H:%M"),
    list(dtickrange = list("M1", "M12"), value = "%H:%M"),
    list(dtickrange = list("M12", NULL), value = "%H:%M")
  )
)
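The dtickrange bounds above are given in milliseconds (1000 is one second, 60000 one minute, 3600000 one hour, 86400000 one day and 604800000 one week), while "M1" and "M12" denote one month and twelve months. As a sketch, a small helper can derive these bounds and rebuild the same list, which makes it easier to change the format of all coarser zoom levels at once. make_tickformatstops is hypothetical and not part of catmaply.

```r
# Milliseconds per time unit, as used by plotly's dtickrange
ms_per <- c(second = 1, minute = 60, hour = 3600, day = 86400, week = 604800) * 1000

# Hypothetical helper: rebuilds the tickformatstops list shown above,
# using `fmt` for every zoom level of one minute and coarser
make_tickformatstops <- function(fmt = "%H:%M") {
  # NULL at either end means "unbounded"
  bounds <- list(NULL, ms_per[["second"]], ms_per[["minute"]], ms_per[["hour"]],
                 ms_per[["day"]], ms_per[["week"]], "M1", "M12", NULL)
  formats <- c("%H:%M:%S.%L", "%H:%M:%S", rep(fmt, 6))
  lapply(seq_along(formats), function(i) {
    list(dtickrange = list(bounds[[i]], bounds[[i + 1]]), value = formats[[i]])
  })
}

tfs <- make_tickformatstops()
```

The result could then be passed as tickformatstops = make_tickformatstops() in the catmaply() call above.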